This notebook is to explore Canadian SARS-CoV-2 genomic and epidemiological data, for discussion with pillar 6’s team and for sharing with collaborators. These analyses can spur further research within or across pillars, be used for reports (or data dashboards), given to the science communication pillar for public dissemination, and the code can be repackaged to give to public health authorities for their internal use.
Canadian genomic and epidemiological data will be regularly pulled from various sources to keep these analyses up-to-date.
## 1. LOAD processed metadata of Canadian sequences (with latest pangolin, division, and full seq IDs)
metaCANall <- read.csv("./GISAID/metadata_CANall_2021-09-12_edited.csv")
metaCANall$Collection.date <- as.Date(metaCANall$Collection.date)
#max(metaCANall$Collection.date) - min(metaCANall$Collection.date) #time diff: 580 days
#make a pango.group column (combines results from gisaid and my pangolin run):
metaCANall$pango.group <- metaCANall$scorpio.call
#fill all empty/na cells
metaCANall$pango.group[metaCANall$pango.group == ""]<-NA
#rename some of the groups:
metaCANall$pango.group<-str_replace_all(metaCANall$pango.group, fixed("B.1.1.7-like+E484K"), "Alpha (B.1.1.7-like) +E484K" )
metaCANall$pango.group<-str_replace_all(metaCANall$pango.group, fixed("B.1.617.1-like"), "Kappa (B.1.617.1-like)" )
metaCANall$pango.group<-str_replace_all(metaCANall$pango.group, fixed("B.1.621-like"), "Mu (B.1.621-like)")
metaCANall$pango.group<-str_replace_na(metaCANall$pango.group, "other")
## 2. LOAD epidemiological data (PHAC)
epidataCANall <- read.csv("./epidata/CDNepidata_covid19-download_PHAC_05-10-21.csv")
#from: https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html?stat=num&measure=total&map=pt#a2
Note that in this demo almost all of Ontario’s sequences were not included due to the lack of complete dates in GISAID. Moving forward, using VirusSeq Portal data, we should have better coverage in Ontario.
Sequence counts for all Canadian data in GISAID, up to August 25th, 2021, shows that by end of summer Alpha and Gamma were still the most sequenced variants. Note that some of the “other” category include some Delta sublineages (AYs) but overall, from the beginning of the pandemic to the fall of 2021, Canadian sequences were mostly of the wildtype lineages (pre-VOCs).
# --- Histogram plot: counts per variant of all canadian data -------------
df1 <- metaCANall %>% group_by(pango.group) %>% mutate(count_name_occurr =n()) #count the occurances by names
#plot
ggplot(df1, aes(x=reorder(pango.group, -count_name_occurr), fill=type))+
geom_bar(stat="count", fill = "steelblue", color="steelblue4")+
theme_classic()+ ylab("count")+xlab("")+
#ggtitle("Sequence counts all Canada per variant in GISAID - up to 2021-08-25")+
theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust = 1), axis.text = element_text(size = 12),
axis.title.x = element_text(size=12), axis.title.y = element_text(size = 12))
#------------- counts over time
#plot
ggplot(metaCANall, aes(x=Collection.date, fill=pango.group))+geom_bar(stat = "count")+
theme_classic() #+ggtitle("Canadian data in GISAID - pulled Sep 12, 2021")
There are two PANGO lineages that have a Canadian origin and have predominately spread within Canada (with some exportations internationally): B.1.438.1 and B.1.1.176.
Other lineages of Canadian interest:
Down-sampled Canadian SARS-CoV-2 genomes. Taken from GISAID Sept. 12th, 2021. Alignment GISAID, ML tree in IQ-tree. This tree was generated using TreeTime and visualized in ggtree.
This tree shows several features of VOC spread in Canada:
These last two points are suggestive of strain (variant) replacement, i.e. competitive exclusion, but more detailed analyses would be required to clarify this.
Divergence tree from TreeTime run, visualized in ggtree.
There are various methods to investigate changes in evolutionary rates of VOC/VOIs and compare their relative fitness in an epidemiological context.
For example, root-to-tip plots give estimates of substitution rates. Clusters with stronger positive slopes than the average SARS-CoV-2 dataset, are an indication that they are accumlating mutations at a faster pace, or clusters that jump up above the average could inidcate a mutational jump (simlar to what we saw with Alpha when it first appeared in the UK).
ML IQ-tree run in Tempest for root-to-tip data. Linear regression and plotting in R.
Overall, the evolutionary rate of the Canadian data (0.000875 subs/site/year, grey line) is very close to the global average of 24.162 subs/year (= 0.000822 subs/site/year). Delta, Gamma, and Alpha are all slower than this global average.
Add here:
Are there any BEAST analyses we’d like to do? e.g. infer R0, serial interval, etc for the different Delta sublineages (might have to restrict this geographically to specific PTs with enough coverage)
The version numbers of all packages in the current environment as well as information about the R install is reported below.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Canada.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] viridis_0.6.1 viridisLite_0.4.0 colorspace_2.0-2 RColorBrewer_1.1-2
## [5] kableExtra_1.3.4 gridExtra_2.3 ggbeeswarm_0.6.0 DT_0.19
## [9] cowplot_1.1.1 ggtree_3.0.4 phytools_0.7-90 maps_3.4.0
## [13] phangorn_2.7.1 tidytree_0.3.5 phylotools_0.2.2 ape_5.5
## [17] treeio_1.16.2 lubridate_1.7.10 reticulate_1.22 knitr_1.36
## [21] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [25] readr_2.0.2 tidyr_1.1.4 tibble_3.1.4 ggplot2_3.3.5
## [29] tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] ellipsis_0.3.2 fs_1.5.0 aplot_0.1.1
## [4] rstudioapi_0.13 farver_2.1.0 fansi_0.5.0
## [7] xml2_1.3.2 codetools_0.2-18 mnormt_2.0.2
## [10] jsonlite_1.7.2 broom_0.7.9 dbplyr_2.1.1
## [13] png_0.1-7 compiler_4.1.1 httr_1.4.2
## [16] backports_1.2.1 assertthat_0.2.1 Matrix_1.3-4
## [19] fastmap_1.1.0 lazyeval_0.2.2 cli_3.0.1
## [22] htmltools_0.5.2 tools_4.1.1 igraph_1.2.6
## [25] coda_0.19-4 gtable_0.3.0 glue_1.4.2
## [28] clusterGeneration_1.3.7 fastmatch_1.1-3 Rcpp_1.0.7
## [31] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.3.8
## [34] svglite_2.0.0 nlme_3.1-153 xfun_0.26
## [37] rvest_1.0.1 lifecycle_1.0.1 MASS_7.3-54
## [40] scales_1.1.1 hms_1.1.1 parallel_4.1.1
## [43] expm_0.999-6 yaml_2.2.1 ggfun_0.0.4
## [46] yulab.utils_0.0.2 sass_0.4.0 stringi_1.7.4
## [49] highr_0.9 plotrix_3.8-2 rlang_0.4.11
## [52] pkgconfig_2.0.3 systemfonts_1.0.2 evaluate_0.14
## [55] lattice_0.20-45 labeling_0.4.2 patchwork_1.1.1
## [58] htmlwidgets_1.5.4 tidyselect_1.1.1 magrittr_2.0.1
## [61] R6_2.5.1 generics_0.1.0 combinat_0.0-8
## [64] DBI_1.1.1 pillar_1.6.3 haven_2.4.3
## [67] withr_2.4.2 scatterplot3d_0.3-41 modelr_0.1.8
## [70] crayon_1.4.1 utf8_1.2.2 tmvnsim_1.0-2
## [73] tzdb_0.1.2 rmarkdown_2.11 grid_4.1.1
## [76] readxl_1.3.1 reprex_2.0.1 digest_0.6.28
## [79] webshot_0.5.2 numDeriv_2016.8-1.1 gridGraphics_0.5-1
## [82] munsell_0.5.0 beeswarm_0.4.0 ggplotify_0.1.0
## [85] vipor_0.4.5 bslib_0.3.0 quadprog_1.5-8